Hereditary Disease Discovery from a Clinical Data Warehouse

Authors

  • Hong Yu
  • George Hripcsak
Abstract

Introduction: Hereditary disease discovery and pattern recognition normally rely on a large-scale, detailed family health history. However, a large-scale, well-orchestrated family health history is difficult to obtain. Here we try a new method for hereditary disease and pattern discovery that does not require a family health history.

Background and Rationale: Columbia Presbyterian Medical Center (CPMC) maintains the Columbia Data Warehouse (CDW), which contains the electronic medical records of more than 6 million patients dating back to 1979. The data warehouse schema contains not only each patient's MRN (medical record number) and diagnosis, but also first name, last name, and address (street, city, country, and zip code). The rationale is that hereditary diseases occur more often among family members than non-hereditary diseases do, and that transmittable diseases usually occur more frequently among people who live together. Close relatives usually share the same last name. Thus, the hypothesis is that patients with hereditary diseases will share last names, and patients with transmittable diseases will share addresses, more often than randomly selected patients in the CDW.

Methods: For each of the following well-known genetic and transmittable diseases, we obtained patients' last name, first name, street, city, state, and zip code from the data warehouse: asthma, breast cancer, diabetes (type II), hemophilia A, Duchenne's muscular dystrophy, Huntington's disease, and tuberculosis. As controls, we retrieved all patients with appendicitis in the CDW and all patients in the CDW from January 1 to March 31, 1995 (as random patients). For each query, we obtained the total number of retrieved entries. If the number of entries was high (more than 170), we repeatedly (100 times) selected 170 entries at random from the total and obtained the average number of patients who share one of the following attributes or attribute combinations with one other patient: last name; zip code; address (street, city, country, and zip code); last name and zip code; last name and address. If the total number of entries was less than 170, we obtained the numbers of patients sharing the above attributes from all retrieved entries. In addition, we repeatedly selected at random an equal number of entries from the control and obtained the number of patients sharing the attributes.

Results and Discussion: The results (Table 1) show that asthma, diabetes (type II), tuberculosis, and hemophilia A have a much higher percentage of patients sharing last names than the randomly selected patients. Huntington's disease has a higher percentage of patients sharing addresses than all the other diseases. Tuberculosis has a slightly higher percentage of patients sharing addresses than the others. Interestingly, breast cancer and Duchenne's muscular dystrophy have a lower percentage of patients sharing either address or last name. On the other hand, appendicitis has a higher percentage of patients sharing last names than the others. Thus, in general, our results do not support the hypothesis. Possible reasons include the following. The probability that family members attend the same hospital could be low. The selected genetic diseases are rare in the population (fewer than 100 patients for hemophilia A and Huntington's disease). In addition, the CDW contains electronic medical records covering a period of only 20 years, and many hereditary diseases need a long time to recur in a family tree.

Acknowledgements: This work was supported by National Library of Medicine grants LM05627 "Linking Knowledge-Based Systems to Clinical Databases" and LM07079 "Research Training Grant".

References:
1. Evans S, Lemon SJ, Deters CA, Fusaro RM, Lynch HT (1997), Automated detection of hereditary syndromes using data mining, Comput Biomed Res 30: 337-48.
2. McKusick VA (1998), Mendelian Inheritance in Man: a catalog of human genes and genetic disorders, 12th edition.
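
The resampling procedure described in the Methods can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the record layout (a dictionary per patient), the function names `count_sharing` and `mean_sharing`, and the reading of "share with one other patient" as "match at least one other record exactly" are all hypothetical. No name or address normalization is attempted.

```python
import random
from collections import Counter

SAMPLE_SIZE = 170  # sample size and threshold stated in the abstract
N_REPEATS = 100    # number of repeated random draws stated in the abstract


def count_sharing(records, keys):
    """Count records whose values for `keys` exactly match at least one other record."""
    combos = [tuple(r[k] for k in keys) for r in records]
    freq = Counter(combos)
    return sum(1 for c in combos if freq[c] > 1)


def mean_sharing(records, keys, sample_size=SAMPLE_SIZE,
                 n_repeats=N_REPEATS, rng=None):
    """Average sharing count over repeated random subsamples.

    Cohorts with at most `sample_size` records are counted once, in full.
    """
    if len(records) <= sample_size:
        return float(count_sharing(records, keys))
    rng = rng or random.Random()
    total = 0
    for _ in range(n_repeats):
        # Draw `sample_size` records without replacement.
        total += count_sharing(rng.sample(records, sample_size), keys)
    return total / n_repeats


# Made-up toy records, for illustration only.
toy = [
    {"last_name": "Lee", "zip": "10032"},
    {"last_name": "Lee", "zip": "10033"},
    {"last_name": "Kim", "zip": "10032"},
    {"last_name": "Diaz", "zip": "10040"},
]
by_name = count_sharing(toy, ["last_name"])
by_name_and_zip = count_sharing(toy, ["last_name", "zip"])
```

Passing a list of keys lets the same function cover single attributes (last name, zip code) and combinations (last name and zip code). A control cohort of equal size could be evaluated by calling `mean_sharing` with `sample_size` set to the size of the disease cohort.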





Publication date: 2000